Abstract

This note exposes the differential topology and geometry underlying
some of the basic phenomena of optimal transportation. It surveys
basic questions concerning Monge maps and Kantorovich measures:
existence and regularity of the former, uniqueness of the latter, and
estimates for the dimension of its support, as well as the associated
linear programming duality. It shows the answers to these questions
concern the differential geometry and topology of the chosen
transportation cost. It also establishes new connections (some heuristic
and others rigorous) based on the properties of the cross-difference
of this cost, and its Taylor expansion at the diagonal.

1 Introduction 

What is optimal transportation? This subject, reviewed by Ambrosio and
Gigli [5], McCann and Guillen [58], Rachev and Rüschendorf [69], and Villani
[81] [82] among others, has become a topic of much scrutiny in recent years,
driven by applications both within and outside mathematics. However, the
problem has also led to the development of its own theory, in which a
number of challenging questions arise and some fascinating answers have
been discovered. The present manuscript is intended to reveal some of the
differential topology and geometry underlying these questions, their solution
and variants, and give some novel and simple yet powerful heuristics for
a few highlights from the literature that we survey. It attempts to frame
the phenomenology of the subject, without delving deeply into many of the
methodologies, both novel and standard, which are used to pursue it.
The new heuristics are largely based on the properties of the cross-difference
(4), and its Taylor expansion (6) at the diagonal.

Given Borel probability measures $\mu^\pm$ on complete separable metric spaces
$M^\pm$, and a continuous bounded function $c(x,y)$ representing the cost per unit
mass transported from $x \in M^+$ to $y \in M^-$, the basic question is to correlate
the measures $\mu^+$ and $\mu^-$ so as to minimize the total transportation cost. In
Monge's 1781 formulation [63], we seek to minimize



$$\mathrm{cost}(G) := \int_{M^+} c(x, G(x)) \, d\mu^+(x) \qquad (1)$$

among all Borel maps $G : M^+ \to M^-$ pushing $\mu^+$ forward to $\mu^- = G_\#\mu^+$,
where the pushed-forward measure is defined by $G_\#\mu^+(Y) = \mu^+(G^{-1}(Y))$
for each $Y \subset M^-$. This question is interesting because it leads to canonical
ways to reparameterize one distribution of mass with another. When the
probability measures are given by densities $d\mu^\pm(x) = f^\pm(x)\,dx$ on manifolds
$M^\pm$, we can expect $G$ to satisfy the Jacobian equation
$\pm\det[\partial G^i/\partial x^j] = f^+(x)/f^-(G(x))$. Additional desirable properties of $G$
can sometimes be guaranteed by a suitable choice of transportation cost; for
example, $G$ will be irrotational for the quadratic cost $c(x,y) = \frac12|x-y|^2$ on
Euclidean space [9]. For subsequent purposes, we will often assume the cost
$c(x,y)$ and manifolds $M^\pm$ to be smooth, but quite general otherwise.
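As a minimal sketch of the objects entering (1), assume both measures are finitely supported (a toy setting chosen for illustration; the helper names `pushforward` and `monge_cost` and the sample data are hypothetical, not from the text). The push-forward sums the mass of each preimage, and the cost becomes a weighted sum.

```python
from fractions import Fraction

def pushforward(mu, G):
    """Return G#mu for a finitely supported measure mu given as {point: mass}."""
    nu = {}
    for x, mass in mu.items():
        y = G(x)
        # (G#mu)({y}) = mu(G^{-1}({y})): accumulate the mass of the preimage
        nu[y] = nu.get(y, Fraction(0)) + mass
    return nu

def monge_cost(mu, G, c):
    """Discrete analogue of (1): sum of c(x, G(x)) weighted by mu."""
    return sum(mass * c(x, G(x)) for x, mass in mu.items())

# Toy data (assumed): three equally weighted points on the line.
mu_plus = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}
shift = lambda x: x + 3                             # candidate map G
quadratic = lambda x, y: Fraction((x - y) ** 2, 2)  # c(x, y) = |x - y|^2 / 2

mu_minus = pushforward(mu_plus, shift)
total = monge_cost(mu_plus, shift, quadratic)
```

Exact rational arithmetic is used so that the marginal constraint $G_\#\mu^+ = \mu^-$ can be checked by equality rather than up to rounding.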

In Kantorovich's 1942 formulation, we seek to minimize 



$$\mathrm{cost}(\gamma) := \int_{M^+ \times M^-} c(x,y) \, d\gamma(x,y) \qquad (2)$$

over all joint measures $\gamma \ge 0$ on $M^+ \times M^-$ having $\mu^+$ and $\mu^-$ as marginals.
The form of the latter problem (minimize the linear functional $\mathrm{cost}(\gamma)$ on
the convex set $\Gamma(\mu^+,\mu^-) := \{\gamma \ge 0 \mid \pi^\pm_\#\gamma = \mu^\pm\}$, where $\pi^+(x,y) = x$ and
$\pi^-(x,y) = y$) makes it easy to show the Kantorovich infimum is attained.
A result of Pratelli [67] following Ambrosio and Gangbo asserts that its value
coincides with the Monge infimum

$$\min_{\gamma \in \Gamma(\mu^+,\mu^-)} \mathrm{cost}(\gamma) = \inf_{G_\#\mu^+ = \mu^-} \mathrm{cost}(G) \qquad (3)$$

if $c$ is continuous and $\mu^+$ is free of atoms. However, it is not
straightforward to establish uniqueness of the Kantorovich minimizer, nor whether
the Monge infimum is attained, and if so, whether the mapping $G$ which attains
it is continuous. A sufficient condition (A1)'$^+$ for existence (and
uniqueness) of optimizers $G$ (and $\gamma$) was found by Gangbo [30] and Levin [44],
building on work of many authors, including Brenier, Caffarelli, Gangbo,
McCann, Rachev and Rüschendorf. When $M^\pm \subset \mathbf{R}^n$ are the closures of
open domains, sufficient conditions for the existence of a smooth minimizer
$G : M^+ \to M^-$ were provided by Ma, Trudinger and Wang [53], building
on work of Delanoe, Caffarelli, Urbas and Wang, and later refined through
work of Delanoe, Figalli, Ge, Kim, Liu, Loeper, McCann, Rifford, Trudinger,
Villani and Wang, among others. See Appendix A for a statement of their
conditions (A0)'-(A4)'. At the same time, we introduce a new but
equivalent formulation of conditions (A0)'-(A4)' in terms of the cross-difference
(4), which emphasizes their purely topological (A0)-(A2) and geometric
(A3)-(A4) nature, exposing their naturality and relevance. This process of
reformulation, begun with Kim in [38], is completed here, as part of a series
of questions and responses.

2 Why do Kantorovich minimizers concentrate on low-dimensional sets?

Abstractly, one expects a linear functional $\mathrm{cost}(\gamma)$ on a convex set
$\Gamma(\mu^+,\mu^-)$ to attain its infimum at one of the extreme points. So it is
interesting to understand the extreme points of $\Gamma(\mu^+,\mu^-)$. Such extreme points
are sometimes called simplicial measures. Despite much progress, surveyed in
[2], a characterization of simplicial measures in terms of their support has
long remained elusive and is probably too much to hope for. Recall that
a measure $\gamma \ge 0$ is simplicial if it is not the midpoint of any segment in
$\Gamma(\pi^+_\#\gamma, \pi^-_\#\gamma)$. Ahmad, Kim and McCann [2] showed each simplicial
measure $\gamma$ vanishes outside the union of a graph $\{(x,G(x)) \mid x \in M^+\}$ and
an antigraph $\{(H(y),y) \mid y \in M^-\}$, generalizing Hestir and Williams' [35]
result from the special case of Lebesgue measure $\mu^\pm$ on the unit interval
$M^\pm = [0,1]$. This shows $\gamma$ concentrates on a set whose topological
dimension should not exceed $\max\{n^+,n^-\}$, where $n^\pm = \dim M^\pm$. Taking $n^+ \le n^-$
without loss of generality, if the measure $\mu^-$ fills the space $M^-$, then $\gamma$ cannot
concentrate on any subset of lower dimension than $n^-$, so it would seem we
have identified the topological dimension of the set on which $\gamma$ concentrates
to be precisely $n^- = \max\{n^+,n^-\}$. Unfortunately, this simple argument is
somewhat deceptive. Although the graph and antigraph of [35] and [2] enjoy
further structure, they are not generally $\gamma$-measurable; a priori it is
conceivable that their closures might actually fill the product space $N = M^+ \times M^-$.

With some assumptions on the topology of the cost function $c$ and spaces
$M^\pm$, it is possible to better estimate the size of support of the particular
extreme points of interest using the more robust notion of Hausdorff
dimension. The basic object of geometrical relevance will be the support $\mathrm{spt}\,\gamma$ of the
Kantorovich optimizer, defined as the smallest closed subset $S \subset M^+ \times M^-$
carrying the full mass of $\gamma$. If the Monge infimum is attained by a map
$G : M^+ \to M^-$ and the Kantorovich minimizer is unique, it will turn
out that $\mathrm{spt}\,\gamma$ agrees ($\gamma$-a.e.) with the graph of $G$; when this map is a
diffeomorphism, then $\gamma$ concentrates on a subset of dimension $n^+ = \dim M^+$
in $M^+ \times M^-$. We shall show why this might be expected more generally,
assuming $M^\pm$ to be (smooth) manifolds henceforth.

Setting $N := M^+ \times M^-$ and $S = \mathrm{spt}\,\gamma$, consider the cross-difference [?]

$$\delta(x,y;x_0,y_0) := c(x,y_0) + c(x_0,y) - c(x,y) - c(x_0,y_0) \qquad (4)$$

defined on $N^2$. An observation (special cases of which date back to Monge)
asserts $\delta(x,y;x_0,y_0) \ge 0$ on $S^2 \subset N^2$; in other words, we cannot lower the
cost by exchanging partners between $(x,y)$ and $(x_0,y_0)$; for a modern proof,
see Gangbo and McCann [31]. This fact is called the c-monotonicity of $S$.
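A two-point illustration (a toy case added here, not taken from the text) shows how the cross-difference governs the exchange of partners. Suppose $\mu^+$ places mass $\frac12$ on each of $x_1, x_2$ and $\mu^-$ places mass $\frac12$ on each of $y_1, y_2$. Every coupling then has the form $\gamma_t$ with mass $t \in [0,\frac12]$ on $(x_1,y_1)$ and $(x_2,y_2)$ and mass $\frac12 - t$ on $(x_1,y_2)$ and $(x_2,y_1)$. Writing $c_{ij} = c(x_i,y_j)$:

```latex
\mathrm{cost}(\gamma_t)
  = t\,(c_{11} + c_{22}) + (\tfrac12 - t)\,(c_{12} + c_{21})
  = \tfrac12 (c_{12} + c_{21}) - t\,\delta(x_1, y_1; x_2, y_2),
\qquad
\delta(x_1, y_1; x_2, y_2) = c_{12} + c_{21} - c_{11} - c_{22}.
```

The cost is affine in $t$, so it is minimized at an endpoint of the segment, and the pairing $t = \frac12$ supported on $\{(x_1,y_1),(x_2,y_2)\}$ is optimal precisely when the cross-difference between its two support points is non-negative.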

If $c \in C^2$, then $(x_0,y_0) \in N$ is a critical point for the function
$\delta^0(x,y) := \delta(x,y;x_0,y_0)$, whose Hessian

$$h = \tfrac12 \mathrm{Hess}\,\delta^0(x_0,y_0) \qquad (5)$$

is then well-defined (though it need not be at points $(x,y) \ne (x_0,y_0)$ which
are non-critical). Now for $(x_0,y_0) \in S$, we have $\delta^0(x,y) \ge 0$ on $S$, with
equality at $(x_0,y_0)$. On the other hand, the symmetries of the cross-difference
$\delta$ ensure that the Hessian $h$ contributes the only non-vanishing term in the
second order Taylor expansion of $\delta^0$: more explicitly

$$\delta^0(x_0 + \Delta x, y_0 + \Delta y) = h((\Delta x,\Delta y),(\Delta x,\Delta y)) + o(|\Delta x|^2 + |\Delta y|^2) \qquad (6)$$

$$= -\sum_{i=1}^{n^+} \sum_{j=1}^{n^-} D^2_{x^i y^j} c(x_0,y_0)\, \Delta x^i \Delta y^j + o(|\Delta x|^2 + |\Delta y|^2)$$

as $(\Delta x,\Delta y) \to 0$. It is then not so surprising to discover that the Hessian
$h$ controls the geometry and dimension of the support of any Kantorovich
optimizer $\gamma$ near $(x_0,y_0)$ in various ways, as we now make precise following
Pass [66] and my joint works with Kim [38], Pass and Warren [60].
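The expansion (6) can be checked numerically in the simplest case $n^+ = n^- = 1$. The sketch below assumes the test cost $c(x,y) = \cos(x-y)$, chosen only because its mixed derivative $\cos(x-y)$ is easy to write down; the function names are illustrative. The ratio of the cross-difference to the leading term should approach 1 as the increments shrink.

```python
import math

def cross_difference(c, x, y, x0, y0):
    """delta(x, y; x0, y0) as in (4)."""
    return c(x, y0) + c(x0, y) - c(x, y) - c(x0, y0)

# Assumed smooth test cost on the line; its mixed derivative is cos(x - y).
c = lambda x, y: math.cos(x - y)
mixed = lambda x, y: math.cos(x - y)

x0, y0 = 0.3, -0.2
ratios = []
for step in (1e-1, 1e-2, 1e-3):
    dx, dy = step, 2 * step
    exact = cross_difference(c, x0 + dx, y0 + dy, x0, y0)
    leading = -mixed(x0, y0) * dx * dy   # leading term of (6) when n+ = n- = 1
    ratios.append(exact / leading)
```

Note that the pure second derivatives in $x$ and in $y$ cancel out of the cross-difference, which is why only the mixed term survives in the leading order.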

Let $(k_+, k_0, k_-)$ be the $(+,0,-)$ signature of $h$, meaning $k_+, k_0, k_- \in \mathbf{N}$
count the number of positive, zero, and negative eigenvalues of $h$ in one and
hence any choice of coordinates.

Claim 2.1 (Signature and rank) For each $(x_0,y_0) \in N$, the signature of
(5) is given by $(k_+, k_0, k_-) = (k, n^+ + n^- - 2k, k)$ where $k \le \min\{n^+,n^-\}$
and $n^\pm = \dim M^\pm$. We may henceforth refer to the integer $2k$ as the rank of
$c \in C^2$ (and of $h$) at $(x_0,y_0)$; it depends lower-semicontinuously on $(x_0,y_0)$.

Proof. The sum $k_+ + k_0 + k_- = n^+ + n^-$ must agree with the total
dimension of $N = M^+ \times M^-$. Since any perturbation direction $(\Delta x, \Delta y)$
in which $\delta^0$ grows corresponds to another direction $(-\Delta x, \Delta y)$ in which $\delta^0$
shrinks (6), it follows that $k_+ = k_-$. Thus $(k_+, k_0, k_-) = (k, n^+ + n^- - 2k, k)$.

In fact, since the matrix $h$ is symmetric, in any coordinate system we can
find a basis of orthogonal eigenvectors for $h$. The preceding argument shows
that if $(\Delta x, \Delta y)$ is an eigenvector with eigenvalue $\lambda > 0$ then $(-\Delta x, \Delta y)$ is an
eigenvector with eigenvalue $-\lambda$. In this case
$\Delta x^i = -\frac{1}{2\lambda} \sum_{j=1}^{n^-} D^2_{x^i y^j} c(x_0,y_0)\, \Delta y^j$
is determined by $\Delta y$ and vice versa, so at most $k \le \min\{n^+,n^-\}$ eigenvectors
can correspond to positive eigenvalues [66].

Lower semicontinuity of $k = k(x_0,y_0)$ follows from the fact that $c \in C^2$.

■
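In coordinates, the symmetry used in this proof can be made explicit. Writing $D$ for the $n^+ \times n^-$ matrix of mixed second derivatives (a notation introduced here only for illustration), differentiating (4) twice at $(x_0,y_0)$ gives a block form for $h$:

```latex
D := \Big( \frac{\partial^2 c}{\partial x^i \partial y^j}(x_0, y_0) \Big)_{i,j},
\qquad
h = -\frac12 \begin{pmatrix} 0 & D \\ D^{T} & 0 \end{pmatrix},
\qquad
h \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}
  = \lambda \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}
\ \Longrightarrow\
h \begin{pmatrix} -\Delta x \\ \Delta y \end{pmatrix}
  = -\lambda \begin{pmatrix} -\Delta x \\ \Delta y \end{pmatrix}.
```

The non-zero eigenvalues of such a block matrix are $\pm\sigma/2$ as $\sigma$ ranges over the non-zero singular values of $D$, so in these coordinates $k$ equals the rank of $D$, which cannot exceed $\min\{n^+,n^-\}$.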

The Hessian $h$ of the cross-difference also determines the spacelike,
timelike, and lightlike cones $\Sigma^+, \Sigma^-$ and $\Sigma^0 \subset T_{(x_0,y_0)}N$ according to the
definitions $\Sigma^\pm = \{V \in T_{(x_0,y_0)}N \mid \pm h(V,V) \ge 0\}$ and $\Sigma^0 = \Sigma^+ \cap \Sigma^-$.

Definition 2.2 (Spacelike, timelike, lightlike) A subset $S \subset N$ is
spacelike if each (not necessarily continuous) curve $t \in [-1,1] \mapsto z(t) \in S$
differentiable at $t = 0$ satisfies $h(\dot z(0), \dot z(0)) \ge 0$, where $\dot z(0)$ is the tangent vector
and $h$ denotes the Hessian (5) at $(x_0,y_0) = z(0)$. Similarly, $S$ is timelike (or
lightlike) if the inequality is reversed (or if both inequalities hold).

Since we want to allow sets $S$ which are rough and potentially incomplete,
it is important to permit curves in the definition above whose continuity at
$t = 0$ may not extend to any neighbourhood of $t = 0$.
Lemma 2.3 (c-monotone implies spacelike) Any c-monotone set $S \subset N$
is spacelike.

Proof. Take any curve $(x(t),y(t)) \in S$, not necessarily continuous but
differentiable at $(x_0,y_0) = (x(0),y(0))$, with tangent vector $V = (\dot x(0), \dot y(0))$.
Since $\delta^0(x(t),y(t)) \ge 0$, from (5)-(6) we see $h(V,V) \ge 0$, as desired. ■

Corollary 2.4 (Dimensional bounds) If $c$ is $C^2$ and has rank $2k$ at a
point $(x_0,y_0)$ where $S$ has a well-defined tangent space $T$, then c-monotonicity
of $S \subset N$ implies the dimension of this tangent space satisfies
$\dim T \le n^+ + n^- - k$.

Proof. Fix coordinates on $N$. As a consequence of the (Courant-Fischer)
min-max formula for eigenvalues of $h$ at $(x_0,y_0)$, the signature
$(k_+, k_0, k_-) = (k, n^+ + n^- - 2k, k)$ of $h$ limits the maximal number of linearly independent
tangent vectors to $N$ which are not timelike to $k_+ + k_0 = n^+ + n^- - k$. Since
the preceding lemma shows the tangent space $T$ of $S$ to be spanned by such
a set of vectors, its dimension satisfies the asserted bound. ■

The following much stronger result of Pass [66] asserts $S$ is contained
in a spacelike Lipschitz submanifold of the prescribed dimension, and hence
implies differentiability a.e. as a consequence instead of a hypothesis. The
case $k = n^+ = n^-$ was worked out earlier by McCann, Pass and Warren [60],
by adapting an idea of Minty [61] [3] from the special case $c(x,y) = -x \cdot y$.

Theorem 2.5 (Rectifiability [66]) If $c$ has rank $2k$ at $(x_0,y_0)$ and is $C^2$
nearby, then on a (possibly smaller) neighbourhood $N_0 \subset M^+ \times M^-$ of
$(x_0,y_0)$, c-monotonicity of $S \subset N_0$ implies $S \subset L$ where $L \subset N_0$ is a spacelike
Lipschitz submanifold of dimension $\dim L \le n^+ + n^- - k$ with $n^\pm = \dim M^\pm$.

Idea of proof. A kernel of the proof can be apprehended already in the
one-dimensional case $n^\pm = 1$. When $c$ has rank zero, taking $L = N_0$ implies
the result, so assume $c$ has full rank ($2k = 2$), meaning either
$\partial^2 c/\partial x \partial y < 0$ or $\partial^2 c/\partial x \partial y > 0$ near $(x_0,y_0)$. In the first case, c-monotonicity of $S$
implies $S \cap R$ is contained in a non-decreasing subset of any sufficiently small
two-dimensional rectangle $R = B_\epsilon(x_0) \times B_\epsilon(y_0)$. This monotonicity is
well-known in both mathematical [51] and economic contexts [76] [62]. Rotating
coordinates by setting $u = (x+y)/\sqrt2$ and $v = (y-x)/\sqrt2$, the monotonicity
is equivalent to asserting that $S \cap R$ is contained in the graph $\{(u,V(u))\}$
of a function $v = V(u)$ with Lipschitz constant one. In the second case,
c-monotonicity would imply $S \cap R$ is non-increasing, hence contained in a
1-Lipschitz graph of $u$ over $v$.

The same argument carries over immediately to the bilinear cost
$c(x,y) = -x \cdot y$ in higher dimensions $n^+ = n^-$ [61]. For other costs with rank
$2k = 2n^+ = 2n^-$, one can make a similar argument after a linear change
of coordinates $\tilde x = x - x_0$ and $\tilde y = A(y - y_0)$ chosen so that in the new
coordinates the cost takes the form
$c(\tilde x,\tilde y) = -\tilde x \cdot \tilde y + o(|\tilde x|^2 + |\tilde y|^2) + f(\tilde x) + g(\tilde y)$
[60]. The cases $k < \min\{n^+,n^-\}$ and $n^+ \ne n^-$ are worked out in [66]. ■
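The passage from a non-decreasing set to a 1-Lipschitz graph in the one-dimensional case can be verified directly (a short check added here). Take two points of $S \cap R$; monotonicity lets us order them so that the increments satisfy $\Delta x \ge 0$ and $\Delta y \ge 0$. In the rotated coordinates $u = (x+y)/\sqrt2$, $v = (y-x)/\sqrt2$:

```latex
\Delta u = \frac{\Delta x + \Delta y}{\sqrt2} \ge 0,
\qquad
|\Delta v| = \frac{|\Delta y - \Delta x|}{\sqrt2}
           \le \frac{\Delta x + \Delta y}{\sqrt2} = \Delta u .
```

In particular $\Delta u = 0$ forces $\Delta x = \Delta y = 0$, so distinct points of $S \cap R$ have distinct $u$-coordinates, and $v$ is a function of $u$ on $S \cap R$ with Lipschitz constant one.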

When the rank of $c$ is maximal (i.e. $k = \min\{n^+,n^-\}$), then the
dimensional bound is $\dim L \le \max\{n^+,n^-\}$. Taking $n^+ \le n^-$ without loss of
generality, if the measure $\mu^-$ fills $M^-$ (say, by being mutually absolutely
continuous with respect to Lebesgue measure in any coordinate patch), the
dimension of the Lipschitz submanifold $L$ on which $\gamma$ concentrates cannot be
less than $n^-$, in which case we see the bound given by the theorem is sharp:
$\dim L = n^-$.

Example 2.6 (Submodular costs on the line) If $M^\pm = \mathbf{R}$ there is a
unique measure in $\Gamma(\mu^+,\mu^-)$ whose support $S = \mathrm{spt}\,\gamma$ forms a non-decreasing
subset of the plane. This measure is the unique minimizer of Kantorovich's
problem for each cost $c \in C^1(\mathbf{R}^2)$ satisfying $\partial^2 c/\partial x \partial y < 0$; see e.g.
[?]. Apart from at most countably many vertical segments, the set $S$ is
contained in the graph of some $G : \mathbf{R} \to \mathbf{R} \cup \{\pm\infty\}$ non-decreasing. Unless
$\mu^+$ has atoms, the vertical segments in $S$ are $\gamma$-negligible, in which case
$\gamma = (\mathrm{id} \times G)_\#\mu^+$ and Monge's infimum is attained uniquely by $G$.
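A finite analogue of this example can be tested by brute force. The sketch below assumes equal point masses at distinct locations, so that candidate maps are bijections; such measures have atoms, so this is only a discrete stand-in for the statement above, and the data and function names are illustrative. It confirms that the cheapest pairing for the cost $\frac12|x-y|^2$ (whose mixed derivative is $-1$) is the non-decreasing one, and that its support is c-monotone.

```python
from itertools import permutations

def cross_difference(c, x, y, x0, y0):
    """delta(x, y; x0, y0) as in (4)."""
    return c(x, y0) + c(x0, y) - c(x, y) - c(x0, y0)

def brute_force_matching(xs, ys, c):
    """Cheapest bijection xs[i] -> ys[p[i]], found by trying every permutation."""
    n = len(xs)
    return min(permutations(range(n)),
               key=lambda p: sum(c(xs[i], ys[p[i]]) for i in range(n)))

# Assumed toy data: equal masses at distinct points, cost with d^2c/dxdy = -1 < 0.
xs = [0.9, -1.0, 0.2, 2.5]
ys = [3.0, 0.1, -0.7, 1.4]
c = lambda x, y: 0.5 * (x - y) ** 2

best = brute_force_matching(xs, ys, c)
support = sorted((xs[i], ys[best[i]]) for i in range(len(xs)))

# The optimal pairing is non-decreasing: sorting by x also sorts by y.
is_monotone = all(a[1] <= b[1] for a, b in zip(support, support[1:]))
# c-monotonicity: the cross-difference is non-negative on pairs of support points.
is_c_monotone = all(cross_difference(c, x, y, x0, y0) >= 0
                    for (x, y) in support for (x0, y0) in support)
```

Enumerating permutations is only feasible for a handful of points; for sorted inputs the monotone pairing can of course be read off without any search.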

Example 2.7 (Transporting mass between spheres) Transporting mass
on the surface of the earth has led to consideration of the cost function
$c(x,y) = \frac12|x-y|^2$ restricted to the unit sphere
$x,y \in \partial B_1^{n+1}(0) \subset \mathbf{R}^{n+1}$, so that $0 \le c \le 2$ [?], a problem
considered earlier in the context of shape recognition [?]. The restricted cost
has rank $2n$ except on the degenerate set $c = 1$, where it has rank $2n - 2$.
Thus any c-cyclically monotone subset $S$ of the $2n$-dimensional product space
$\partial B_1^{n+1}(0) \times \partial B_1^{n+1}(0)$ has dimension at most $n$ except along the degenerate
set, where it has dimension at most $n + 1$ (in spite of the fact that the
degenerate set is $2n - 1$ dimensional). Since the degenerate set separates the
orientation preserving and orientation reversing parts $S^+$ and $S^-$ of $S$, this
means that $S^+$ cannot intersect $S^-$ transversally (except in dimension $n = 1$);
instead, if $S^+$ meets $S^-$ at a point where both have $n$-dimensional tangent
spaces, these spaces must have $n - 1$ directions in common. For example, if
$n = 3$, then both $S^+$ and $S^-$ are generically 3-dimensional, but their union is
contained in a 4-dimensional Lipschitz submanifold, whereas the cost
degenerates on a smooth 5-dimensional hypersurface.
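The range of the restricted cost and the location of its degenerate set can be read off from a one-line computation, assuming the cost is $\frac12|x-y|^2$ as the stated bounds $0 \le c \le 2$ suggest:

```latex
c(x, y) = \tfrac12 |x - y|^2
        = \tfrac12 \big( |x|^2 + |y|^2 \big) - x \cdot y
        = 1 - x \cdot y
\qquad \text{for } |x| = |y| = 1 .
```

Thus $c = 0$ exactly when $y = x$, $c = 2$ exactly when $y = -x$, and the degenerate set $\{c = 1\}$ consists of the pairs of orthogonal unit vectors, $x \cdot y = 0$.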

In summary, c-monotonicity implies rectifiability of $S = \mathrm{spt}\,\gamma \subset N = M^+ \times M^-$
in a dimension determined locally by the rank of the Hessian $h$
of the cross-difference $\delta^0 \in C^2$; moreover $S$ must be spacelike with respect
to this Hessian (5). If $h$ is non-degenerate, we will eventually see that $h$ can
be viewed as a pseudo-metric on $N$ whose Riemannian sectional curvatures
combine with $\mu^\pm$ to determine smoothness of $S$.

3 When do optimal maps exist? 

We now turn to the more classical question of attainment of the infimum
(3). To expect existence of Monge maps, we generally need $\mu^+$ to be more
than atom-free. We need $\mu^+$ not to concentrate positive mass on any lower
dimensional submanifold of $M^+$, or more precisely on any hypersurface
parameterized locally in coordinates as the graph of a difference of convex
functions. This condition, proposed by Gangbo and McCann [31], is sharp in a
sense made precise by Gigli [33], and implies Lipschitz continuity and
$C^2$-rectifiability of the hypersurfaces in question. Absolute continuity of $\mu^+$ in
coordinates (i.e. the existence of a density $f^+$ such that $d\mu^+(x) = f^+(x)\,dx$)
is more than enough to guarantee this. However, we also require further
structure of the transportation cost.

For $c \in C^1(N)$, the Gangbo [30] and Levin [44] criterion for existence of
Monge solutions $G : M^+ \to M^-$ given in Appendix A is equivalent to:
(A1)$^+$ For each $x_0 \in M^+$ and $y_0 \ne y_1 \in M^-$, assume $x \in M^+ \mapsto \delta^0(x,y_1)$
has no critical points, where $\delta^0(x,y_1) = \delta(x,y_1;x_0,y_0)$ is from (4).
Naturally, this implies $n^+ \ge n^-$, due to the fact we cannot generally hope
to use a (rectifiable) map $G$ on a low dimensional space to spread a
measure over a higher dimensional space. In fact, (A1)$^+$ implies something
stronger: namely that every solution of the Kantorovich problem is a Monge
solution. This in turn implies uniqueness of the Kantorovich (and hence
Monge) solution, for the following reason. Suppose two Kantorovich
solutions exist, and both correspond to Monge solutions: $\gamma_0 = (\mathrm{id} \times G_0)_\#\mu^+$
and $\gamma_1 = (\mathrm{id} \times G_1)_\#\mu^+$. Linearity of the Kantorovich problem implies
$\gamma_2 := (\gamma_0 + \gamma_1)/2$ is again a solution, hence by (A1)$^+$ must concentrate on the
graph of a map $G : M^+ \to M^-$. It is then easy to argue $\gamma_i = (\mathrm{id} \times G)_\#\mu^+$
for $i = 0, 1, 2$ as in e.g. [2]. This implies $\gamma_0 = \gamma_1$; moreover $G_0 = G = G_1$
$\mu^+$-a.e. Thus we arrive at the following theorem [30] [44] [2] [33]:

Theorem 3.1 (Existence and uniqueness of optimal maps) Let $\mu^\pm$ be
probability measures on manifolds $M^\pm$, with a cost $c \in C^1(M^+ \times M^-)$ which
is bounded and satisfies (A1)$^+$. If $\mu^+$ assigns zero mass to each Lipschitz
hypersurface in $M^+$, then Kantorovich's minimum is uniquely attained, and
the minimizer $\gamma = (\mathrm{id} \times G)_\#\mu^+$ vanishes outside the graph of a map $G$
solving Monge's problem. In fact, not all Lipschitz hypersurfaces are required:
it is enough that $\mu^+$ vanish on each hypersurface locally parameterizable in
coordinates as the graph of a difference of two convex functions.

Notice (A1)$^+$ asserts the restriction of $\delta^0$ to each horizontal fibre
$M^+ \times \{y_1\}$ has no critical points, except on the fibre $y_1 = y_0$ where $\delta^0$ vanishes
identically. To guarantee invertibility of the map $G$, we need the same
condition to hold for the reflected cost $c^*(y,x) := c(x,y)$, meaning the roles of
$M^+$ and $M^-$ are interchanged. If both $c$ and $c^*$ satisfy (A1)$^+$, we say (A1)
holds. Thus (A1) is equivalent to asserting that $(x_0,y_0)$ is the only critical
point of $\delta^0(x,y)$.

Many interesting costs, such as $c(x,y) = h(x-y)$ with $h$ strictly convex
or concave on $M^\pm = \mathbf{R}^n$, satisfy these hypotheses. The most classical of
these is the Euclidean distance squared [?] [13]. Regularity of the
convex gradient map it induces, generalizing Example 2.6, was established
by Delanoe [17] for $n = 2$ and Caffarelli [10] [11] and Urbas [80] for $n \ge 3$.
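For instance, in the convex case the criterion (A1)$^+$ can be checked by hand. Assuming $h$ is differentiable and strictly convex on $\mathbf{R}^n$ (a sketch of this case only), the cost $c(x,y) = h(x-y)$ gives:

```latex
D_x \delta^0(x, y_1)
  = D_x c(x, y_0) - D_x c(x, y_1)
  = \nabla h(x - y_0) - \nabla h(x - y_1),
\qquad
\big( \nabla h(a) - \nabla h(b) \big) \cdot (a - b) > 0 \quad (a \ne b).
```

Taking $a = x - y_0$ and $b = x - y_1$ with $y_0 \ne y_1$, strict monotonicity of $\nabla h$ shows the gradient above cannot vanish, so $x \mapsto \delta^0(x,y_1)$ has no critical points.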

Example 3.2 (Euclidean distance squared) If $M^\pm \subset \mathbf{R}^n$ and $\mu^+$
vanishes on all hypersurfaces, there is a unique measure in $\Gamma(\mu^+,\mu^-)$
concentrated on the graph of the gradient of a convex function $u : \mathbf{R}^n \to \mathbf{R} \cup \{+\infty\}$.
This measure is the unique minimizer of Kantorovich's problem (3) for the
cost $c(x,y) = \frac12|x-y|^2$ [?]. If $d\mu^\pm = f^\pm d\mathcal{H}^n$ are both absolutely
continuous with respect to Lebesgue, the Monge-Ampère equation
$f^+(x) = f^-(Du(x)) \det D^2u(x)$ holds $\mu^+$-a.e. [?]. If moreover, $\log f^\pm \in L^\infty(M^\pm)$
with $M^-$ convex and $\mathcal{H}^n(\partial M^+) = 0$, then $u \in C^{1,\alpha}_{loc}(M^+)$ for some $\alpha > 0$ [?]
estimated in 12^ . If, in addition f^ 1 £ C 1 '? and M + and M~ are both smooth 
and strongly convex — meaning the principle curvatures of their boundaries 
are all strictly positive — then u £ C 2 ^(M + ) for all < (3 < (3 < 1 [T?|/ fli]/ 
Higher regularity follows from smoothness of $f^\pm$.
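The Monge-Ampère equation of this example can be checked in closed form in one dimension. The sketch below assumes Gaussian measures $\mu^+ = N(0,1)$ and $\mu^- = N(m,s^2)$ on the line, an illustrative choice not taken from the text. The increasing affine map $x \mapsto m + sx$ pushes one onto the other and is the derivative of the convex function $u(x) = mx + sx^2/2$.

```python
import math

def gaussian_density(x, mean, std):
    """Density of N(mean, std^2) with respect to Lebesgue measure."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Assumed example: mu+ = N(0, 1) and mu- = N(m, s^2) on the real line (n = 1).
m, s = 1.5, 2.0
u_prime = lambda x: m + s * x        # Du for the convex u(x) = m x + s x^2 / 2
u_second = lambda x: s               # D^2 u, positive since s > 0

residuals = []
for x in (-2.0, -0.5, 0.0, 0.7, 3.0):
    lhs = gaussian_density(x, 0.0, 1.0)                     # f+(x)
    rhs = gaussian_density(u_prime(x), m, s) * u_second(x)  # f-(Du(x)) det D^2u(x)
    residuals.append(abs(lhs - rhs))
```

The residuals vanish up to rounding at every sample point, as the change of variables formula predicts.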
On the other hand, (A1)$^+$ also fails for many interesting geometries. We
mention two such examples. In the first (the cost function of interest to
Monge [63]) optimal maps turn out to exist but are not unique. Their
non-uniqueness was quantified with Feldman [22]. In the second, Monge's
infimum turns out not to be attained, despite the fact that the Kantorovich
minimizer is unique.

Example 3.3 (Uniqueness fails for Monge's cost) Let open sets $M^\pm \subset \mathbf{R}^n$
have finite volume and $c(x,y) = |x-y|$. Monge was originally interested
in transporting the uniform measure $\mu^\pm = \frac{1}{\mathcal{H}^n(M^\pm)} \mathcal{H}^n|_{M^\pm}$ from one domain to the
other, when $n = 3$ and $\mathcal{H}^n$ denotes the $n$-dimensional Hausdorff measure, and
coincides with Lebesgue measure in this case [?]. Taking $M^+$ disjoint from
$M^-$ ensures smoothness of $c$. Notice that when $n = 1$ and $M^+$ and $M^-$ are
disjoint intervals, every $\gamma \in \Gamma(\mu^+,\mu^-)$ has the same total cost $\mathrm{cost}(\gamma)$. In
this case the solution to Kantorovich's problem is badly non-unique. Clearly
(A1)$^+$ also fails in this case. In higher dimensions, the situation is slightly
less degenerate since the cost takes a range of values on $\Gamma(\mu^+,\mu^-)$, but it
remains true that its extrema are not uniquely attained. In this setting, it
can be a difficult problem to show that Monge's infimum is attained. This
problem was first solved by Sudakov in the plane $n = 2$; he asserted a result
in all dimensions but it was later discovered that one of his claims sometimes
fails if $n > 2$. This existence result was extended to higher dimensions by
Evans and Gangbo, assuming $\mu^\pm$ to be given by Lipschitz continuous densities
on $\mathbf{R}^n$ [?], and for general absolutely continuous densities $\mu^\pm$ by Ambrosio,
Trudinger-Wang [?] and Caffarelli-Feldman-McCann [?] simultaneously
and independently. The last group also considered costs given by
non-Euclidean norms, but with smooth and strongly convex unit balls, restrictions
removed in a sequence of papers by different teams of authors including
Ambrosio, Bernard, Buffoni, Bianchini, Caravenna, Kirchheim, and Pratelli,
and culminating in work of Champion and DePascale [?].

On the other hand, if $M^+$ is a compact manifold without boundary, it is
evident that $x \in M^+ \mapsto \delta^0(x,y_1)$ must attain at least one maximum and
one minimum so that, as long as the cost is assumed differentiable, it
is clear that (A1)$^+$ cannot be satisfied. In this case, it will not always be
true that Monge's infimum (3) is attained, as my examples with Gangbo [32]
show:
Example 3.4 (Transporting mass between spheres, revisited) Restrict
$c(x,y) = \frac12|x-y|^2$ to $M^\pm = \partial B_1^{n+1}(0) \subset \mathbf{R}^{n+1}$ so that $0 \le c \le 2$, as in
Example 2.7. Take $\mu^\pm$ to be mutually absolutely continuous with respect to
surface area $\mathcal{H}^n$ on their respective spheres, but take most of the mass of
$\mu^+$ to be concentrated near the north pole and most of the mass of $\mu^-$ to
be concentrated near the south pole. Then Monge's infimum (3) will not be
attained, despite the fact that the Kantorovich minimizer $\gamma$ is unique. The
intersection of $S = \mathrm{spt}\,\gamma$ with the set $c < 1$ is contained in the graph of a
map $G : M^+ \to M^-$, while the intersection $S \cap \{c > 1\}$ is contained in
the graph of a map $H : M^- \to M^+$, sometimes called an antigraph. If
the densities $f^\pm = d\mu^\pm/d\mathcal{H}^n$ are both bounded, so that $\log f^\pm \in L^\infty$, then
$G$ is a homeomorphism of $\partial B_1^{n+1}(0)$ and $H$ may be taken to be continuous [?];
both maps enjoy a local Hölder exponent of continuity $\alpha = 1/(4n-1)$ except
possibly where their graphs touch the set $\{c = 1\}$ where the rank of $c$ drops
from $2n$ to $2n - 2$ [59]. It may be possible to improve this Hölder exponent
to $\alpha = 1/(2n-1)$ using techniques of Liu [46], but even when $f^\pm$ are smooth
we have no idea how to prove $G$ will be smoother, nor how to extend Hölder
continuity of $G$ up to the degenerate set $\{c = 1\}$.

Notice that global differentiability of the cost is crucial to this discus- 
sion. For costs whose differentiability fails — even on a small set such as 
the Riemannian cut locus — the theorem which follows gives many natural 
examples where existence and uniqueness both hold. 

Theorem 3.5 (Minimizing Riemannian distance squared) Let
$c(x,y) = d^2(x,y)/2$ be the square distance induced by some Riemannian metric on a
compact manifold $M^+ = M^-$. If $\mu^+$ is absolutely continuous (with respect to
Riemannian volume) then the Kantorovich minimizer is unique in (3), and
takes the form $\gamma = (\mathrm{id} \times G)_\#\mu^+$ for a map solving Monge's problem [57].
In case $M^\pm$ are round spheres [?] (or quotients [?], submersions [?] or
products thereof [?]), and both $\mu^\pm$ are given by smooth positive densities with
respect to surface area, then the map $G$ will be a smooth diffeomorphism.

Notice that the existence and uniqueness asserted in Theorem 3.5 is not
quite a corollary of Theorem 3.1, since compactness of the manifold $M^\pm$
forces the cut-locus to be non-trivial. Here the cut-locus is defined as (the
closure of) the set of points where differentiability of the cost $c = d^2/2$ fails.
4 When are optimal measures unique?



The preceding section shows that if the cross-difference $\delta^0(x,y) = \delta(x,y;x_0,y_0)$
has no critical points unless $x = x_0$ or $y = y_0$, then Monge's problem is
soluble and the Kantorovich problem admits a unique solution. Although very
useful when it applies, this criterion is not satisfied in all cases of interest:
for example, when trying to minimize the restriction of the quadratic cost
$c(x,y) = |x-y|^2/2$ to the Euclidean unit sphere $M^\pm = \partial B_1(0) \subset \mathbf{R}^{n+1}$. In
such situations, my results with Chiappori, Nesheim [?], Ahmad and Kim
[2] may be useful:

Theorem 4.1 (Uniqueness of minimizer for subtwisted costs) Fix Borel
probability measures $\mu^\pm$ on manifolds $M^\pm$, with $\mu^+$ vanishing on each
hypersurface in $M^+$, and a bounded cost $c \in C^1(M^+ \times M^-)$. Suppose for each
$x_0 \in M^+$ and $y_0 \ne y_1 \in M^-$, the cross-difference $\delta^0(x,y) := \delta(x,y;x_0,y_0)$
from (4) satisfies

$$x \in M^+ \mapsto \delta^0(x,y_1) \ \text{has at most two critical points, namely, a unique global minimum and a unique global maximum.} \qquad (7)$$

Then the Kantorovich problem has a unique solution, and it takes the form
$\gamma = (\mathrm{id} \times G)_\#\mu + (H \times \mathrm{id})_\#(\mu^- - G_\#\mu)$ for some maps $G : M^+ \to M^-$
and $H : M^- \to M^+$ and non-negative measure $\mu \le \mu^+$ such that $\mu^- - G_\#\mu$
vanishes on the range of $G$.

In other words, the unique Kantorovich solution concentrates on the union
of a graph and an antigraph, of $G : M^+ \to M^-$ and of $H : M^- \to M^+$
respectively. Notice that if the manifold $M^+$ is compact, hypothesis
(7) restricts its Morse structure to be that of the sphere, so the theorem
generalizes Example 3.4. However, apart from the continuity results of [32]
[?] and [?], it is not known when $G$ and $H$ can be expected to be smooth. It
is even more shocking that no criterion analogous to Theorem 4.1 is known
which guarantees uniqueness of the Kantorovich minimizer on the torus, or
indeed on any other compact manifolds $M^\pm$ apart from the sphere.

5 When are optimal maps continuous? Smooth? 

Examples 3.2, 3.4 and Theorem 3.5 complement Theorems 2.5 and 3.1 by
providing a variety of settings where the optimal map $G$ is continuous and/or
the support of the optimal measure can actually be shown to be smooth. In each
case, we need the cost to be suitable, the domain geometry to be favorable,
and the measures to be positive, bounded and possibly smooth.

Following the analysis of a number of such examples, including the
restriction of $c(x,y) = -\log|x-y|$ to the unit sphere [83] [?], a general theory
for addressing such questions has begun to be developed, starting from the
pioneering work of Ma, Trudinger and Wang [53], who identified conditions
on the transportation cost $c$ which are close to being necessary and sufficient
for smoothness of $G$. Their work is set on bounded domains $M^\pm \subset \mathbf{R}^n$, and
as we now explain, each of their conditions can be reformulated in terms of
the topology and geometry of the cross-difference $\delta^0(x,y) = \delta(x,y;x_0,y_0)$
from (4) and its Hessian $h = \frac12 \mathrm{Hess}_{(x_0,y_0)} \delta^0$.

Where $c$ has full rank $2n$, the Hessian $h$ is non-degenerate and can be
understood as a pseudo-Riemannian metric tensor on the product space.
According to Claim 2.1, this pseudo-metric tensor is not positive definite, but
instead has the same number of spacelike and timelike dimensions. At each
point $(x_0,y_0) \in N$, the light-cone separating these spacelike from
timelike directions consists of the tangent spaces to $\{x_0\} \times M^-$ and $M^+ \times \{y_0\}$.
However, just as in Riemannian (and Lorentzian) geometry, the
pseudo-metric tensor $h$ induces a geometry on the product space $N = M^+ \times M^-$,
including geodesics and a pseudo-Riemannian curvature tensor $R_{i'j'k'l'}$, which
assigns sectional curvature

$$\sec^{(N,h)}_{(x_0,y_0)} P \wedge Q = \sum_{1 \le i',j',k',l' \le 2n} R_{i'j'k'l'} P^{i'} Q^{j'} P^{k'} Q^{l'}$$

to each pair of vectors $P, Q \in T_{(x_0,y_0)}N$. The explicit formulae expressing
geodesics and the curvature tensor in terms of $h$ can be found in [38] or
deduced from Appendix A; they are precisely analogous to the Riemannian
case.

In terms of these notions, we may now state conditions equivalent to those
of Ma, Trudinger and Wang (A1)'-(A4)' found in Appendix A below:

(A0) $c \in C^4(N)$, and for each $(x_0,y_0) \in N = M^+ \times M^- \subset \mathbf{R}^n \times \mathbf{R}^n$:

(A1) $(x,y) \in N \mapsto \delta^0(x,y)$ from (4) has no critical points save $(x_0,y_0)$;

(A2) $c$ has rank $2n$, so $h = \frac12 \mathrm{Hess}_{(x_0,y_0)} \delta^0$ defines a pseudo-metric tensor;

(A3) $\sec^{(N,h)}_{(x_0,y_0)} (p \oplus 0) \wedge (0 \oplus q) \ge 0$ for each lightlike $(p,q) \in T_{(x_0,y_0)}N$;

(A4) the sets $\{x_0\} \times M^-$ and $M^+ \times \{y_0\}$ are $h$-geodesically convex.
Here a subset $Z \subset N$ is said to be $h$-geodesically convex if each pair of points
of $Z$ can be joined by a geodesic in $N$ lying entirely
within $Z$, geodesics being defined relative to the pseudo-metric $h$.

The most intriguing of these conditions is the curvature condition (A3). 
A large body of example costs which satisfy [53] [35] [T8] [38] [27] [28] p] 
[4"T] [12] P2] or violate it [53] [IE] have now been established. Among the 
former we may mention the restriction of the Euclidean distance squared 
to the graphs M ± C R n+1 of any pair of 1-Lipschitz convex functions [53] . 
as well as the Riemannian distance squared on the round sphere [19], and 
any products [38], submersions [38] or perturbations [18] [27] [28] thereof. 
Among the latter we may mention the Riemannian distance squared on any
manifold $(M, g_{ij})$ with a negative sectional curvature somewhere [38],
and the restriction of the Euclidean distance squared to the graphs of two
functions in $\mathbf{R}^{n+1}$, one of which is convex and the other non-convex [53].
Thus the distance squared in hyperbolic space $c = d^2_{\mathbf{H}^n}$ violates (A3), though
$c = -\cosh d_{\mathbf{H}^n}$ satisfies it [15] [12].

To conclude continuity or higher regularity of $G$ at present requires a
slight strengthening of one of the geometric conditions (A3) or (A4). If the
inequality in (A3) holds strictly whenever the $h$-orthogonal vectors
$p \oplus 0$ and $0 \oplus q$ are non-vanishing, we denote that by (A3)$_s$. If
instead the geodesic convexity of the sets in (A4) is strong (i.e. 2-uniform,
in the sense of Example 3.2 or Appendix A), we denote that by (A4)$_s$. Under
these assumptions the following extensions of Theorem 3.1 and Example 3.2
have been proved, in works of Ma, Trudinger, Wang, Loeper, Liu, Figalli, Kim
and myself.

Theorem 5.1 (Continuity and smoothness of optimal maps) Assume
(A0)-(A4) hold, and $d\mu^\pm = f^\pm d\mathcal{H}^n$ are given by densities satisfying
$\log f^\pm \in L^\infty(U^\pm)$ with $U^- = M^-$ and $U^+ \subset M^+$ open. (a) If (A3)$_s$
holds, the map $G \in C^{\alpha}_{loc}(U^+, M^-)$ is Hölder continuous JTffi , with an
exponent $\alpha = 1/(2n-1)$ known to be sharp [Jffi . (b) If (A3)$_s$ fails but
(A4)$_s$ holds, the same conclusion persists but with an unknown exponent
$\alpha$ independent of $c$, but presumed to depend on
$\|\log(f^+/f^-)\|_{L^\infty(U^+ \times M^-)}$. Either way, higher interior regularity
of $G$ follows from smoothness of $f^\pm$ [fflj . If $U^\pm = M^\pm$ and $f^\pm$ are
smooth in case (b), the smoothness of $G$ shown in [53] extends up to the
boundary [79].

It is possible to construct smooth bounded $f^\pm$ for which continuity of $G$
fails in the absence of either (A3) or (A4), as was done by Loeper [H] and
by Ma, Trudinger and Wang [53] respectively. Still, there are few results
quantifying the discontinuities of $G$, except for the cost
$c(x,y) = \frac{1}{2}|x - y|^2$ of Example 3.2 [85] [23] [21], for which examples of
discontinuous maps go back to Caffarelli [10].

6 Closed forms and c-cyclical monotonicity 

The sections above have discussed many necessary conditions for optimality
of $\gamma$, but few sufficient conditions. In fact, for bounded continuous
$c \in C(M^+ \times M^-)$, a condition on the support $S = \operatorname{spt} \gamma$ well-known
to be necessary and sufficient for optimality in
$\Gamma(\pi^+_\# \gamma, \pi^-_\# \gamma)$ is given by:

Definition 6.1 (c-cyclical monotonicity) A set $S \subset M^+ \times M^-$ is
c-cyclically monotone if and only if each $k \in \mathbf{N}$, sequence
$(x_1, y_1), \ldots, (x_k, y_k) \in S$, and permutation $\tau$ on $k$ letters satisfy
the following inequality:

$$\sum_{i=1}^{k} c(x_i, y_i) \le \sum_{i=1}^{k} c(x_{\tau(i)}, y_i). \tag{8}$$

This result can be found in Pratelli [68] or Schachermayer-Teichmann [71],
building on earlier works of Knott-Smith, Gangbo-McCann, Rüschendorf,
and Ambrosio-Pratelli. The case $k = 2$ corresponds to the c-monotonicity
condition which implies that $S$ is $h$-spacelike. The result quoted above shows
the cross-difference $\delta(x, y; x_0, y_0)$ is just the first in an infinite
sequence of functions whose non-negativity on $S^k$ for each $k \in \mathbf{N}$
characterizes optimality of $\gamma$. In fact, since all permutations are made up
of cycles, for each $k$ it is enough to check (8) for the cyclic permutation
$\tau(i) = i + 1$ if $i < k$ with $\tau(k) = 1$. This family of conditions has a
differential topological content whose relevance we now try to make clear.
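To make inequality (8) concrete, here is a minimal Python sketch that checks it by brute force on a small finite set $S$, once over all permutations and once over cyclic permutations only. The quadratic cost on the line, the sample sets, and all function names are illustrative assumptions, not taken from the text. Only sequences of distinct points are tested, since a chain that revisits a point splits into shorter cycles whose sums add.

```python
from itertools import permutations

def cost(x, y):
    # Illustrative choice (an assumption): quadratic cost on the real line.
    return 0.5 * (x - y) ** 2

def cyclic_gap(seq, c):
    """Right side minus left side of (8) for tau(i) = i + 1, tau(k) = 1."""
    k = len(seq)
    return sum(c(seq[(i + 1) % k][0], seq[i][1]) - c(seq[i][0], seq[i][1])
               for i in range(k))

def permutation_gap(seq, tau, c):
    """Right side minus left side of (8) for an arbitrary permutation tau."""
    return sum(c(seq[tau[i]][0], seq[i][1]) - c(seq[i][0], seq[i][1])
               for i in range(len(seq)))

def monotone_by_cycles(pairs, c, tol=1e-12):
    # Test only cyclic permutations, over every ordered choice of distinct points.
    return all(cyclic_gap(seq, c) >= -tol
               for k in range(2, len(pairs) + 1)
               for seq in permutations(pairs, k))

def monotone_by_all_permutations(pairs, c, tol=1e-12):
    # Test every permutation of every ordered choice of distinct points.
    return all(permutation_gap(seq, tau, c) >= -tol
               for k in range(2, len(pairs) + 1)
               for seq in permutations(pairs, k)
               for tau in permutations(range(k)))

# Hypothetical sample sets: the first lies on an increasing graph, the second does not.
increasing = [(0.0, 0.0), (1.0, 2.0), (2.0, 3.0)]
crossed = [(0.0, 3.0), (1.0, 2.0), (2.0, 0.0)]

print(monotone_by_cycles(increasing, cost), monotone_by_all_permutations(increasing, cost))
print(monotone_by_cycles(crossed, cost), monotone_by_all_permutations(crossed, cost))
```

On these samples the two tests agree, as the reduction to cycles predicts. The brute-force search grows factorially and is only meant for very small sets.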

Choose any map $G : U^+ \subset M^+ \to M^-$ defined on a subset $U^+ \subset M^+$,
whose graph lies inside $S$. Any differentiable loop $\sigma : S^1 \to M^+$ may be
approximated by $x_i = \sigma(\theta_i)$ for a partition
$0 \le \theta_1 < \cdots < \theta_k < 2\pi$ as fine as we please. The non-negative sums (8)
then approximate Riemann sums for the integral

$$0 \le \int_0^{2\pi} D_x c(\sigma(\theta), G(\sigma(\theta))) \cdot \sigma'(\theta) \, d\theta$$

arbitrarily closely. If the form $x \in U^+ \mapsto D_x c(x, G(x))$ is continuous on
an open set $U^+ \subset M^+$ containing $\sigma$, then the Riemann integral exists.
Since the curve can be traversed in either direction, the non-negative
integral must actually vanish, hence the form must be closed: for $U^+$ simply
connected, there would exist $u \in C^1_{loc}(U^+)$ such that
$D_x c(x, G(x)) = Du(x)$. Similarly, if $G$ could be continuously inverted on a
simply connected domain $U^- \subset M^-$, there would exist $v \in C^1_{loc}(U^-)$ such
that $D_y c(G^{-1}(y), y) = Dv(y)$. These suppositions are not so implausible
when (A1)-(A2) hold, since $S$ at least coincides with the graph of a map $G$
and has a well-defined tangent space $\mathcal{H}^n$-almost everywhere.

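The loop argument above can be tried numerically. The sketch below assumes the bilinear cost $c(x,y) = -x \cdot y$ on the plane and takes the loop to be the unit circle with equally spaced points; the two maps and all names are hypothetical choices made for illustration. For a gradient of a convex function the cyclic sums from (8) stay non-negative and shrink as the partition is refined, while for a rotation, which is not a gradient, the sums are negative, so its graph cannot be c-cyclically monotone.

```python
import math

def cyclic_sum(G, k):
    """Sum in (8) for tau(i) = i + 1 along the unit circle, with c(x, y) = -x.y."""
    pts = [(math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k))
           for i in range(k)]
    total = 0.0
    for i in range(k):
        x, x_next = pts[i], pts[(i + 1) % k]
        y = G(x)
        # c(x_next, y) - c(x, y) = -(x_next - x) . y
        total -= (x_next[0] - x[0]) * y[0] + (x_next[1] - x[1]) * y[1]
    return total

def gradient_map(x):
    # G = D(phi) for the convex function phi(x) = x1^2 + x1*x2 + 1.5*x2^2.
    return (2 * x[0] + x[1], x[0] + 3 * x[1])

def rotation_map(x):
    # Quarter turn: not a gradient, so the form -G(x).dx is not closed.
    return (-x[1], x[0])

for k in (10, 100, 1000):
    print(k, cyclic_sum(gradient_map, k), cyclic_sum(rotation_map, k))
```

Reversing the orientation of the loop flips the sign of the limiting integral, which is why a non-negative limit in both directions forces it to vanish.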
However, despite the fact that neither $G$ nor its inverse will be continuous
in general, some vestige of this integrability persists. If $c$ is Lipschitz
continuous for example, then (8) implies the existence of Lipschitz $u, v$
such that $c(x, y) - u(x) - v(y) \ge 0$ on $N = M^+ \times M^-$ with equality
holding throughout $S$. This fact, which goes back to [72] [71], is in many
senses better than mere integrability of a form: it requires no topological
restriction on the domains, and not only do we get the first-order condition
$Du(x) = D_x c(x, y)$ for those points $(x, y) \in S$ with $x$ in the set
$\operatorname{Dom} Du$ of full $\mathcal{H}^n$ measure where $u$ is differentiable; as a
second-order condition we get non-negative definiteness of the matrix
$D^2_{xx} c(x,y) - D^2 u(x) \ge 0$ if $x \in \operatorname{Dom} D^2 u$, and analogous
conditions for $v$. Indeed, $S$ is contained in the gradient of a convex
function when $c(x, y) = -x \cdot y$ or $c(x, y) = \frac{1}{2}|x - y|^2$ on
$U^\pm \subset \mathbf{R}^n$.

As Gangbo and McCann argue [31], this rough integrability result of
Rockafellar and Rochet implies the famous duality of Kantorovich [SB],
Koopmans and Beckmann [3D]:

$$\min_{\gamma \in \Gamma(\mu^+,\mu^-)} \int_{M^+ \times M^-} c(x,y) \, d\gamma(x,y) = \sup_{(u^+,u^-) \in \mathrm{Lip}_c} \int_{M^+} u^+ \, d\mu^+ + \int_{M^-} u^- \, d\mu^- \tag{9}$$

with the supremum over

$$\mathrm{Lip}_c := \{ u^\pm \in L^1(d\mu^\pm) \mid c(x,y) \ge u^+(x) + u^-(y) \text{ throughout } N \} \tag{10}$$

being attained at $(u^+, u^-) = (u, v)$. Indeed, for any
$(u^+, u^-) \in \mathrm{Lip}_c$, integrating the inequality (10) against
$\gamma \in \Gamma(\mu^+, \mu^-)$ yields

$$\int_{M^+ \times M^-} c \, d\gamma \ge \int_{M^+} u^+ \, d\mu^+ + \int_{M^-} u^- \, d\mu^-. \tag{11}$$

Thus the min dominates the sup in (9). Starting from
$\gamma \in \Gamma(\mu^+, \mu^-)$ with c-cyclically monotone support, Rochet's
generalization of Rockafellar's theorem provides
$(u^+, u^-) = (u, v) \in \mathrm{Lip}_c$ (bounded and Lipschitz if $c$ is) such
that equality holds in (11), and hence in (9) as desired.
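The passage from a c-cyclically monotone support to dual potentials can be sketched on a finite example. The code below is an illustration under stated assumptions, not the construction of any cited reference: it takes the quadratic cost on the line, a hypothetical four-point support with uniform weights, and builds $u$ as a cheapest-chain value (computed with Bellman-Ford relaxations, which terminate correctly because cyclical monotonicity rules out negative cycles), then sets $v(y_j) = \min_i c(x_i, y_j) - u(x_i)$. For uniform discrete marginals the primal minimum is attained at a permutation, so it can be found by brute force and compared with the dual value in (9).

```python
from itertools import permutations

def cost(x, y):
    # Illustrative choice (an assumption): quadratic cost on the real line.
    return 0.5 * (x - y) ** 2

def rochet_potentials(pairs, c):
    """Potentials (u, v) on a finite c-cyclically monotone set of pairs."""
    n = len(pairs)
    u = [0.0] + [float("inf")] * (n - 1)
    for _ in range(n - 1):                      # Bellman-Ford relaxation passes
        for j, (xj, yj) in enumerate(pairs):
            for i, (xi, _) in enumerate(pairs):
                step = c(xi, yj) - c(xj, yj)    # cost of moving from pair j to pair i
                if u[j] + step < u[i]:
                    u[i] = u[j] + step
    v = [min(c(xi, yj) - u[i] for i, (xi, _) in enumerate(pairs))
         for (_, yj) in pairs]
    return u, v

def primal_min(pairs, c):
    # Brute force over permutations, with uniform weights 1/n on each point.
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(pairs)
    return min(sum(c(xs[i], ys[s[i]]) for i in range(n)) / n
               for s in permutations(range(n)))

def dual_value(u, v):
    return (sum(u) + sum(v)) / len(u)

# Hypothetical support lying on an increasing graph.
support = [(0.0, 1.0), (1.0, 3.0), (3.0, 4.0), (4.0, 6.0)]
u, v = rochet_potentials(support, cost)
print(u, v, primal_min(support, cost), dual_value(u, v))
```

In this example the inequality in (10) holds at every pair $(x_i, y_j)$ with equality on the support, and the primal and dual values coincide.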
7 Connections to differential geometry



We have already seen that the pseudo-Riemannian geometry induced on the
product space $N = M^+ \times M^-$ by the metric tensor
$h = \frac{1}{2} \operatorname{Hess} \delta^0(x_0, y_0)$ plays a key role in determining whether
or not maps $y = G(x)$ which solve Monge's problem (CQ) are smooth. Here $h$ is
the Hessian of the cross-difference (1)-(6) associated to the cost $c$. The
antisymmetry

$$\delta(x, y; x_0, y_0) = \delta(x_0, y_0; x, y) = -\delta(x, y_0; x_0, y)$$

ensures that $h$ vanishes on $n \times n$ diagonal blocks. The involution
$U(\Delta x, \Delta y) = (\Delta x, -\Delta y)$ on $T_{(x_0,y_0)}N$ allows us to define an
antisymmetrized analog of $h$ by

$$\omega(P, Q) = h(P, U(Q)).$$

Here $\omega$ turns out to be a symplectic form if and only if $h$ has the full
rank $2n = 2n^\pm$ that we often assume. Notice the similarity to Kähler
geometry, with the splitting $T_{(x_0,y_0)}N = T_{x_0}M^+ \oplus T_{y_0}M^-$ of the
tangent space associated to $U$ playing the role of the almost complex
structure $J$, and the cost $c$ playing the role of the Kähler potential. For
geometric measure theory in such geometries see Harvey and Lawson [31].
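The block structure of $h$ and the antisymmetry of $\omega$ can be checked numerically in the simplest case $n = 1$. The sketch below assumes the cross-difference has the form $\delta(x,y;x_0,y_0) = c(x,y_0) + c(x_0,y) - c(x,y) - c(x_0,y_0)$, which is consistent with the antisymmetry displayed above but is defined in equation (1), outside this section. The cost, the base point, the finite-difference step and all names are hypothetical.

```python
import math

def cost(x, y):
    # Hypothetical smooth cost; x**4 and sin(y) depend on one variable only,
    # so they cancel in the cross-difference.
    return math.exp(x * y) + x ** 4 + math.sin(y)

def cross_difference(x, y, x0, y0, c=cost):
    # Assumed form of the cross-difference (equation (1) of the paper).
    return c(x, y0) + c(x0, y) - c(x, y) - c(x0, y0)

def hessian_2x2(f, x0, y0, e=1e-3):
    """Central finite-difference Hessian of f at (x0, y0)."""
    fxx = (f(x0 + e, y0) - 2 * f(x0, y0) + f(x0 - e, y0)) / e ** 2
    fyy = (f(x0, y0 + e) - 2 * f(x0, y0) + f(x0, y0 - e)) / e ** 2
    fxy = (f(x0 + e, y0 + e) - f(x0 + e, y0 - e)
           - f(x0 - e, y0 + e) + f(x0 - e, y0 - e)) / (4 * e ** 2)
    return [[fxx, fxy], [fxy, fyy]]

def pseudo_metric(x0, y0):
    # h = (1/2) Hess of delta^0 at the base point (x0, y0).
    delta0 = lambda x, y: cross_difference(x, y, x0, y0)
    return [[0.5 * a for a in row] for row in hessian_2x2(delta0, x0, y0)]

def omega(h, P, Q):
    # omega(P, Q) = h(P, U(Q)) with U(dx, dy) = (dx, -dy).
    UQ = (Q[0], -Q[1])
    return sum(h[i][j] * P[i] * UQ[j] for i in range(2) for j in range(2))

h = pseudo_metric(0.3, -0.5)
print(h)
print(omega(h, (1.0, 2.0), (3.0, -1.0)), omega(h, (3.0, -1.0), (1.0, 2.0)))
```

The diagonal entries of `h` come out as numerical zeros, the off-diagonal entry approximates $-\frac{1}{2}\,\partial^2 c/\partial x \partial y$ at the base point, and the two values of `omega` differ only in sign, up to finite-difference error.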

Kim and McCann showed that any c-optimal diffeomorphism $G : M^+ \to M^-$
has a graph which is $\omega$-Lagrangian in addition to being $h$-spacelike.
Conversely, when (A0)-(A4) hold, then any diffeomorphism with an
$\omega$-Lagrangian and $h$-spacelike graph is necessarily c-optimal [35]. Here a
submanifold $S \subset N$ is called $\omega$-Lagrangian if $\omega(P, Q) = 0$ for every
pair of vectors $P, Q \in T_{(x_0,y_0)}N$ tangent to $S$. Being $\omega$-Lagrangian
is essentially the integrability condition which asserts closure of the form
$D_x c|_{(x, G(x))}$ on $M^+$; it amounts to equality of the cross-derivatives
$\partial G^i / \partial x^j = \partial G^j / \partial x^i$ which imply the existence of $u$ such
that $G(x) = Du(x)$ in case $c(x, y) = -x \cdot y$.

So far these geometric structures (the pseudo-metric $h$, symplectic form
$\omega$, c-cyclical monotonicity, and c-optimality) reflect only the cost function
$c(x, y)$, and not the densities $d\mu^\pm(x) = f^\pm(x)dx$. Remarkably, however,
there is a conformally equivalent pseudo-metric

$$\tilde h = \left( \frac{f^+(x_0) \, f^-(y_0)}{|\det c_{i,j}(x_0, y_0)|} \right)^{1/n} h$$

for which the graph $\operatorname{Graph}(G)$ of an optimal mapping $G_\# \mu^+ = \mu^-$
turns out to be a zero mean curvature surface, and in fact $\tilde h$-volume
maximizing among homologous surfaces. This surprising connection of optimal
transportation to geometric measure theory was discovered with Kim and Warren
[39].

Thus the properties of optimal maps relate to both sectional and mean
curvatures with respect to $h$. On the other hand, in the special case of the
quadratic cost $c = d^2$ on a Riemannian manifold $M = M^\pm$, several surprising
connections relate optimal transportation to the Riemannian geometry of
$(M, g_{ij})$. For example, in this case Loeper and Villani conjecture [50]
(and in some cases have proved) that (A3)$_s$ implies convexity of the tangent
injectivity locus, which is to say the cut locus of each given point
$x_0 \in M$, lifted to the tangent space $T_{x_0}M^+$ by the Riemannian
exponential $\exp_{x_0}^{-1}$.

An earlier development involved lifting the metrical distance $d$ from $M$ to
the space $P(M)$ of Borel probability measures $\mu^\pm \in P(M)$ using the
minimal transportation cost $d_2(\mu^+, \mu^-) = \sqrt{\mathrm{cost}(\gamma)}$ with respect
to distance squared $c = d^2$ [6] [20] [51]. Geodesic convexity of various
entropy functionals on $P(M)$ turns out to be equivalent to Ricci
non-negativity of $(M, g)$. This was shown by von Renesse and Sturm [70],
building on work of myself [55], of Cordero-Erausquin, Schmuckenschläger and
myself [15], and of Otto and Villani [65]. This idea was turned on its head
by Lott-Villani [52] and independently Sturm [77], who used geodesic
convexity of the same entropies to define Ricci non-negativity in (not
necessarily smooth) metric-measure spaces. This non-negativity is stable
under measured Gromov-Hausdorff convergence, and has significant
consequences.



A Ma-Trudinger-Wang conditions 

The conditions (A0)-(A4) above have been synthesized in a language selected
to manifest their topological and geometric invariance, aspects not readily
apparent [7] from the original formulation by Ma, Trudinger, and Wang [53]
in coordinates on the bounded sets $M^\pm \subset \mathbf{R}^n$, as we now recall.

Use subscripts such as $i$ and $j$ to denote derivatives with respect to $x^i$
and $y^j$, and commas to separate derivatives in $M^+$ from those in $M^-$, so
that $c_{i,j} = \partial^2 c / \partial x^i \partial y^j$ and
$c_{ij,kl} = \partial^4 c / \partial x^i \partial x^j \partial y^k \partial y^l$, etc. Also let $c^{k,l}$ denote
the matrix inverse of $c_{i,j}$, and let $D_x c(x,y) = (c_1, c_2, \ldots, c_n)(x, y)$.
Then the original conditions of Ma, Trudinger and Wang were formulated as the
existence of a constant $C_0 > 0$ such that:

(A0)' $c \in C^4(N)$, and for each $(x_0, y_0) \in N = M^+ \times M^- \subset \mathbf{R}^n \times \mathbf{R}^n$:

(A1)'$_+$ the map $y \in M^- \mapsto D_x c(x_0, y) \in T^*_{x_0}M^+$ is injective;

(A1)' both $c(x, y)$ and $c^*(y, x) := c(x, y)$ satisfy (A0)' and (A1)'$_+$;

(A2)' $\det c_{i,j}(x_0, y_0) \ne 0$;

(A3)'$_s$ $(-c_{ij,kl} + c_{ij,m} c^{m,n} c_{kl,n}) p^i q^j p^k q^l \ge C_0 |p|^2 |q|^2$ whenever $p^i c_{i,j} q^j = 0$;

(A4)' the sets $D_x c(x_0, M^-) \subset \mathbf{R}^n$ and $D_y c(M^+, y_0) \subset \mathbf{R}^n$ are convex.

Here the Einstein summation convention is in effect, and $|p|$ and $|q|$ denote
the Euclidean norm on $p \in T_{x_0}M^+$ and $q \in T_{y_0}M^- \subset \mathbf{R}^n$ respectively.

Their method is heavily based on a priori $C^2$ estimates, which require a
maximum principle for the directional second derivatives
$D^2_{pp} u := u_{ij} p^i p^j$ of the unknown maximizers $u^\pm \in C(M^\pm)$ for the
dual problem (9). A second-order linear elliptic equation satisfied by
$D^2_{pp} u$ is obtained by twice differentiating the prescribed Jacobian
equation for the map $G$, which is a fully nonlinear Monge-Ampère type
equation for the potential $u = u^+$. Condition (A3)'$_s$ ensures the zeroth
order term in the elliptic equation satisfied by $D^2_{pp} u$ has a coefficient
with the correct sign to admit a maximum principle.

The relaxation (A3)' of $C_0 > 0$ to $C_0 = 0$ and the strengthening (A4)'$_s$,
which requires all principal curvatures of $D_x c(x_0, M^-)$ and
$D_y c(M^+, y_0)$ to be positive, were introduced in the subsequent
investigation of boundary regularity by Trudinger and Wang [79]. We leave it
as an exercise to the reader to confirm the equivalence of each primed
hypothesis (A0)'-(A4)' and their variants to the corresponding unprimed
hypothesis in the text. The connection of these conditions to the Riemann
curvature tensor

$$\operatorname{sec}^{(N,h)}_{(x_0,y_0)} (p \oplus 0) \wedge (0 \oplus q) = (-c_{ij,kl} + c_{ij,m} c^{m,n} c_{kl,n}) p^i q^j p^k q^l \tag{12}$$

and geodesic equations for the pseudo-metric
$h = \frac{1}{2} \operatorname{Hess}_{(x_0,y_0)} \delta^0$ was first discovered in my joint work
with Kim [38]. However, the link to the cross-difference $\delta^0(x, y)$
originates in the present work.